Modelling potential environmental impacts of science activity in Antarctica
Abstract
We use GPS data collected on a science expedition in Antarctica to estimate hiking functions for the speed at which humans traverse terrain, differentiated by slope and by ground cover (moraines and rock). We use the estimated hiking functions to build weighted directed graphs as representations of specific environments in Antarctica. From these we estimate, using a variety of graph metrics, particularly betweenness centrality, the relative potential for human environmental impacts arising from scientific activities in those environments. We also suggest a simple approach to planning science expeditions that might allow for reduced impacts in these environments.
1 Introduction
Overview of Antarctic science: when and where, its intensity etc. Background on international treaties, etc.
Relevant findings as to human impacts in Antarctica. Note that in this environment even ‘leave only footprints’ is likely impacting the environment in significant ways.
Overview of sections ahead.
3 Data sources
3.1 Antarctic geospatial data
Geospatial data for Antarctica were obtained from sources referenced in Cox et al. (2023b), Cox et al. (2023a), and Felden et al. (2023). The study area comprises the Skelton and Dry Valleys basins as delineated by the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) project (Mouginot and University Of California Irvine 2017) and shown in Figure 3 (a). The Skelton basin was included because, while the expedition GPS data were ostensibly collected in the McMurdo Dry Valleys, they in fact extend into that basin as shown in Figure 3 (b). Elevation data from the Reference Elevation Model of Antarctica (REMA) project (Howat et al. 2022) and geology from GeoMAP (Cox et al. 2023b) are shown in Figure 3 (c). The five largest areas of contiguous non-ice surface geology across the study area, shown in Figure 3 (d), were chosen as the specific sites for more detailed exploration using the methods set out in this paper. These range in size from around 320 to 2600 square kilometres.
3.2 GPS data from an expedition
FRASER: Timeline, devices used, and associated protocols for scientists while on site.
GPS data were processed to make them better suited for use in the estimation of hiking functions.
The first processing step was to confirm the plausibility of the data, particularly the device-generated speeds, distances between fixes, and elevations associated with fixes. The challenges of post-processing GPS data are well documented and relate to GPS drift, which can lead to estimated non-zero movement speeds as a result of noise in the signal. The raw GPS data included distance since last fix, speed, and elevation estimates, and it was determined in all cases that the device-generated results for these measurements were likely to be more reliable than values calculated by post-processing the raw latitude-longitude fixes.
The second processing step was to remove fixes associated with faster movement on modes of transport other than walking. Wood et al. (2023) cite a number of previous works that detect trip segments based on recorded speeds. This method was straightforwardly applicable to our data, if only to a limited degree: scientists arrive at the expedition base camp, and occasionally travel on trips to more remote experimental sites, by helicopter.
The third, more challenging processing step was to deal with sequences of fixes associated with non-purposeful movement when scientists were in or around base camp, at rest stops, or at experimental sites. Crude filters removed fixes with low recorded distances between fixes (less than 2.5 metres), high turn angles at the fix location (greater than 150°), and fixes recorded on ice-covered terrain, but this did not clean the data sufficiently for further analysis. An additional filtering step was to count fixes (across all scientists) in square grid cells and remove all fixes in grid cells with more than 50 fixes.
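The filtering steps above can be sketched as follows. Our implementation is in R; this is an illustrative Python sketch in which the record layout and the grid cell size used for the density filter are assumptions, not values from our pipeline.

```python
# Hypothetical fix records: (x, y, dist_since_last_m, turn_angle_deg, on_ice)
MIN_DIST_M = 2.5       # drop near-stationary fixes (GPS drift)
MAX_TURN_DEG = 150.0   # drop sharp reversals typical of milling about
MAX_FIXES_PER_CELL = 50
CELL_SIZE_M = 100.0    # grid cell size for the density filter (assumed)

def passes_crude_filters(fix):
    """Apply the distance, turn-angle, and ice-cover filters to one fix."""
    x, y, dist, turn, on_ice = fix
    return dist >= MIN_DIST_M and turn <= MAX_TURN_DEG and not on_ice

def density_filter(fixes, cell=CELL_SIZE_M, max_count=MAX_FIXES_PER_CELL):
    """Count fixes per square grid cell and drop all fixes in crowded cells."""
    counts = {}
    for x, y, *_ in fixes:
        key = (int(x // cell), int(y // cell))
        counts[key] = counts.get(key, 0) + 1
    return [f for f in fixes
            if counts[(int(f[0] // cell), int(f[1] // cell))] <= max_count]

fixes = [(10.0, 20.0, 40.0, 12.0, False),   # kept
         (11.0, 21.0, 1.0, 5.0, False),     # dropped: too close to last fix
         (12.0, 22.0, 30.0, 170.0, False),  # dropped: sharp turn
         (13.0, 23.0, 50.0, 10.0, True)]    # dropped: on ice-covered terrain
kept = density_filter([f for f in fixes if passes_crude_filters(f)])
```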
This left one persistent concern: an over-representation of consecutive fixes recorded at exactly the same elevation, resulting in many fixes with estimated slopes of exactly 0, and leading to a clearly evident dip in estimated movement speeds at 0 slope (Figure 4 (a)). It is likely that these fixes are associated with GPS device drift, so it was decided to remove all fixes where estimated slope was exactly 0. Figure 4 (b) shows the improvement in even a crudely estimated hiking function derived from locally estimated scatterplot smoothing (LOESS). Note that such functions are likely overfitted and are not used further in our analysis, where we favour more easily parameterised functions such as those discussed in Section 2.1.
4 Methods and results
4.1 Hiking functions
We fit three alternative functional forms to the cleaned GPS data: exponential (Tobler 1993), Gaussian (following Irmischer and Clarke 2018), and Lorentz (following Campbell et al. 2019), using the Levenberg-Marquardt algorithm (Moré 1978) as provided by the nlsLM function in the minpack.lm R package (Elzhov et al. 2022). The raw data and fitted curves are shown in Figure 5.
The Lorentz function offers only a marginal improvement in model fit over the Gaussian (residual standard error 1.489 vs 1.491 on moraine, and 1.487 vs 1.488 on rock), while both are clearly better than the exponential form. Moreover, inspection of the curves shows that estimated hiking speeds for the Gaussian functions are much closer to a plausible zero on very steep slopes. We therefore chose to adopt Gaussian hiking functions for the remainder of the present work.
In previous work researchers have applied a ground cover penalty cost to a base hiking function to estimate traversal times. We instead, as shown, estimate different hiking functions for the two ground cover types present. The peak speed on rock is attained on steeper downhill slopes than on moraines, perhaps indicative of the greater care required on downhill gravel slopes. Meanwhile the highest speeds on level terrain are attained on moraines.
Overplotting of the hiking functions, including an additional model fitted to all the data, confirms that the fitted functions are sufficiently different to retain separate models for each ground cover (see Figure 6). Plotting both functions in the same graph makes clearer the difference in maximum speed, and slope at maximum speed, associated with each ground cover.
The estimated hiking functions associated with the two landcovers are \[ \begin{array}{rcl} v_{\mathrm{moraine}} & = & 4.17\,\exp\left[{-\frac{(\theta+0.0526)^2}{0.236}}\right] \\ v_{\mathrm{rock}} & = & 3.76\,\exp\left[{-\frac{(\theta+0.119)^2}{0.365}}\right] \end{array} \tag{3}\]
where the different maximum speeds (in kilometres per hour) and slopes of maximum speed are apparent.
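The fitted functions in Equation 3 are easily evaluated directly. A minimal sketch (in Python rather than the R used elsewhere in this work), taking the slope \(\theta\) as rise over run, which is an assumption of this illustration:

```python
import math

def v_moraine(theta):
    """Estimated hiking speed (km/h) on moraine, per Equation 3."""
    return 4.17 * math.exp(-(theta + 0.0526) ** 2 / 0.236)

def v_rock(theta):
    """Estimated hiking speed (km/h) on rock, per Equation 3."""
    return 3.76 * math.exp(-(theta + 0.119) ** 2 / 0.365)

# Peak speeds occur slightly downhill for both ground covers
print(round(v_moraine(-0.0526), 2))  # 4.17, the moraine maximum
print(round(v_rock(-0.119), 2))      # 3.76, the rock maximum
# On level terrain moraine is faster, consistent with the text
print(round(v_moraine(0.0), 2), round(v_rock(0.0), 2))
```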
4.2 Landscapes as graphs
We developed R code (R Core Team 2024) to build graphs (i.e. networks) with hexagonal lattice structure and estimated traversal times for graph edges derived from our hiking functions. Graphs are manipulated as igraph package (Csárdi and Nepusz 2006; Csárdi et al. 2024) graph objects for further analysis. Important decisions in constructing graphs are the choice of the spacing of the hexagonal lattice, and of the underlying DEM from which graph vertex elevations are derived. Given the extent of the study sites (see Figure 3 (d)) it was decided that a hexagonal lattice (see Figure 2) with hexagons equivalent in area to 100 metre square cells was appropriate. The hexagon centre-to-centre spacing of this lattice is given by \[
100\sqrt{\frac{2}{\sqrt{3}}}\approx 107.5\,\mathrm{metres}
\tag{4}\] Given this lattice resolution we interpolated vertex elevations from a 32 m resolution DEM from the REMA project (Howat et al. 2022) by bilinear interpolation using the R terra package (Hijmans 2024). It would be straightforward to derive vertex elevations from a more detailed DEM if required.
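The spacing in Equation 4 follows from the fact that a hexagonal lattice with centre-to-centre spacing \(d\) tiles the plane with hexagons of area \((\sqrt{3}/2)\,d^2\). A one-line check:

```python
import math

cell_side = 100.0        # side (m) of the square cell whose area the hexagons match
# Hexagon area is (sqrt(3)/2) * d^2, so matching a 100 m square cell
# gives d = 100 * sqrt(2 / sqrt(3))
spacing = cell_side * math.sqrt(2 / math.sqrt(3))
print(round(spacing, 1))  # 107.5 metres
```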
Edge weights (i.e. estimated traversal costs) are assigned by calculating the slope of each directed edge, the estimated hiking speed for that slope, and thus the time the edge should take to traverse. If the estimated traversal time of an edge in the nominal 100 metre lattice is greater than 30 minutes then it is removed from the graph, along with its ‘twin’ edge in the opposite direction. Removing edges in both directions is partly a practical matter, as it simplifies the operation of many graph algorithms, but can also be justified on the basis that a slope steep enough to be a barrier to ascent is unlikely to be traversed when descending.
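The edge-weighting and pruning rule can be sketched as follows, using the moraine hiking function from Equation 3. This is an illustrative Python sketch (the working implementation is in R), and the treatment of slope as rise over run is an assumption of the illustration.

```python
import math

SPACING_M = 107.457  # centre-to-centre spacing of the nominal 100 m lattice
CUTOFF_H = 0.5       # edges taking longer than 30 minutes are removed

def v_moraine(theta):
    """Gaussian hiking function for moraine from Equation 3 (km/h)."""
    return 4.17 * math.exp(-(theta + 0.0526) ** 2 / 0.236)

def edge_cost_hours(z_from, z_to, d=SPACING_M, speed_fn=v_moraine):
    """Traversal time of a directed edge from its slope and a hiking function."""
    theta = (z_to - z_from) / d   # slope of the edge as rise over run
    return (d / 1000.0) / speed_fn(theta)  # hours to cover d metres

def keep_edge_pair(z_a, z_b):
    """Drop both directions if either exceeds the 30 minute cutoff."""
    return max(edge_cost_hours(z_a, z_b), edge_cost_hours(z_b, z_a)) <= CUTOFF_H

print(keep_edge_pair(100.0, 105.0))  # True: gentle slope, both directions kept
print(keep_edge_pair(100.0, 220.0))  # False: near-vertical wall, pair removed
```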
After removal of all such edges only the largest connected graph component is retained, so that the resulting hiking network representation is fully connected, with no isolated vertices unreachable from elsewhere in the network. A map of the fifth largest study area’s hiking network is shown in Figure 7. This network includes 30,697 vertices and 174,798 directed edges. The largest of the five study sites (see Figure 3 (d)) results in a network containing almost a quarter of a million vertices and over 1.4 million directed edges. A hiking network can be used to explore many connectivity properties of the environment. For example, for a chosen origin point, a shortest path tree can be derived showing the quickest routes from that origin to all other vertices (Figure 7 (c)).
4.3 Betweenness centrality
The connectivity properties of a hiking graph can be described using a range of measures of graph structure and used to reveal the relative likelihood of different parts of a terrain being frequently traversed. The particular structural measure we focus on is betweenness centrality, which is explained below.
In a graph \(G=(V,E)\) a path \(P\) is an ordered sequence of vertices, \(P=\left(v_1,v_2,\ldots,v_n\right)\), such that each consecutive pair of vertices \(v_i\) and \(v_{i+1}\) is adjacent, i.e. connected by a directed edge \(e_{i,i+1}\). The length of a path is the number of edges it contains. The weighted length or cost of a path is the sum over its edges of a weight or cost associated with each edge, which we here denote \(w_{i,j}\), i.e., the length \(L\) of a path \(P\) is given by \[ L(P)=\sum_{i=1}^{n-1} w_{i,i+1} \tag{5}\]
The shortest path from \(v_i\) to \(v_j\) is the path \(P=\left(v_i,\dots,v_*,\ldots,v_j\right)\) starting at \(v_i\) and ending at \(v_j\), passing through intervening vertices \(\left(v_*\right)\), such that \(L(P)\) is minimised. In a regular lattice such as our hiking networks there are many shortest paths of equal length if the weight associated with each edge is equal, as it would be if we based weights solely on the length of the edge between each pair of vertices. When edge traversal costs are instead derived from the slope of each edge and a hiking function, the shortest path will be unique, or one of only a small number of possibilities of equal total cost.
Shortest paths are used to develop many measures of vertex centrality in graphs (Freeman 1978). One such measure is betweenness centrality: the number of times a vertex lies on shortest paths between every other pair of vertices in the network. If we denote the number of shortest paths or geodesics between two vertices \(v_j\) and \(v_k\) by \(g_{jk}\), then each appearance of a vertex \(v_i\) on a shortest path between those vertices contributes \(1/g_{jk}\) to the betweenness centrality of \(v_i\). Formally, if we denote the number of times vertex \(v_i\) appears on shortest paths between \(v_j\) and \(v_k\) by \(g_{jk}(v_i)\), then its betweenness \(b_i\) is given by \[ b_i=\sum_{j\neq i}\sum_{k\neq i,j}\frac{g_{jk}(v_i)}{g_{jk}} \tag{6}\] Betweenness centrality is particularly relevant to applications where we are interested in how important vertices are to movement across the graph. An edge betweenness centrality measure can also be calculated on similar principles, but is not considered further here, in part because visualization is challenging given the potential for different scores for the two directed edges between each pair of vertices. Vertex betweenness, on the other hand, yields a single value for each vertex.
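Equation 6 can be illustrated by brute force on a toy graph. Our analysis uses igraph's implementation of Brandes's algorithm; the sketch below instead enumerates every simple path, which is feasible only for very small graphs, and the graph itself is hypothetical.

```python
from itertools import permutations

def betweenness(nodes, w):
    """Brute-force vertex betweenness on a small weighted digraph.

    w[(a, b)] is the traversal cost of directed edge a -> b. Every simple
    path between each ordered pair is enumerated, geodesics are counted,
    and each appearance of an intermediate vertex contributes 1/g_jk.
    """
    b = {v: 0.0 for v in nodes}
    for s in nodes:
        for t in nodes:
            if s == t:
                continue
            best, geodesics = None, []
            inner = [v for v in nodes if v not in (s, t)]
            for r in range(len(inner) + 1):
                for mid in permutations(inner, r):
                    path = (s, *mid, t)
                    if all(e in w for e in zip(path, path[1:])):
                        cost = sum(w[e] for e in zip(path, path[1:]))
                        if best is None or cost < best - 1e-12:
                            best, geodesics = cost, [path]
                        elif abs(cost - best) <= 1e-12:
                            geodesics.append(path)
            for path in geodesics:
                for v in path[1:-1]:
                    b[v] += 1.0 / len(geodesics)
    return b

# Toy network: every path between the four outer vertices crosses 'c'
nodes = ["a", "b", "c", "d", "e"]
w = {}
for u, v, cost in [("a", "c", 1.0), ("b", "c", 1.0),
                   ("c", "d", 1.0), ("c", "e", 1.0)]:
    w[(u, v)] = cost
    w[(v, u)] = cost  # symmetric costs here; asymmetric costs also work
b = betweenness(nodes, w)
print(max(b, key=b.get))  # 'c', the bottleneck vertex
```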
As might be expected, betweenness centralities are computationally demanding to calculate. The time complexity of early algorithms was \(\mathcal{O}(n^3)\), where \(n\) is the number of vertices in the graph (Brandes 2001). An implementation of Brandes’s algorithm (2001) with time complexity \(\mathcal{O}(nm + n^2\log n)\), where \(m\) is the number of edges in the graph, is provided in the igraph package (Csárdi et al. 2024). Even with this improvement in performance, in our application such computational complexity is a strong motivation for working at a nominal 100 metre resolution. Halving the resolution to 50 metres would increase the number of graph vertices four-fold, and lead to a substantial increase in the time taken to calculate betweenness centralities.
Results of betweenness centrality for our example hiking network are shown in Figure 8. While ‘all-paths’ betweenness centralities can be standardised relative to a theoretical maximum of \((n-1)(n-2)\), no straightforward standardisation is possible for the radius-limited case. Since we are primarily interested in betweenness as a measure of the relative vulnerability of different locations to human impacts, we linearly rescaled betweenness scores in all cases with respect to the range of scores across the graph being analysed. The rescaled betweenness \(b_i^\prime\) for vertex \(v_i\) is given by \[ b_i^\prime=\frac{b_i-b_{\min}}{b_{\max}-b_{\min}} \tag{7}\] This approach is also applicable to radius-limited betweenness scores (see Section 4.4).
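Equation 7 is a standard min-max rescaling; a minimal sketch:

```python
def rescale(scores):
    """Linearly rescale scores to [0, 1], as in Equation 7."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# The minimum maps to 0, the maximum to 1, regardless of the raw range
print(rescale([2.0, 5.0, 11.0]))
```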
When all-paths betweenness is calculated, bottlenecks between large sub-areas are strongly highlighted as is the case for the ‘inner bend’ of the relatively narrow neck of navigable terrain that connects the east and west sub-areas of this site. The other feature of this map is the identification of a distinctive structure of ‘arterial’ routes.
4.4 Betweenness centrality limited by radius
The igraph implementation of betweenness centrality provides an option to radius-limit the calculation, meaning that only paths shorter than a specified radius (expressed in cost units, here time) are considered in counting the appearances of vertices on shortest paths. This approach is particularly useful for large graphs, where the time complexity of calculating radius-limited betweenness scales more favourably than the bounds cited above, because the number of shortest paths that must be identified is greatly reduced. We informally confirmed the intuition that, for a given radius limit, the time taken to compute betweenness scales approximately linearly with the number of vertices in the graph, since the number of shortest paths local to each vertex, on which it might be found, is similar.
Results for a series of radius limits are shown in Figure 9, and reveal a clear progression. At very short radii (the 30 minutes case) many locally important paths are identified as potentially relatively heavily trafficked. As the radius increases the pattern emphasises potential paths that are important across wider areas, eventually tending toward the arterial structure of the all-paths case. We discuss these results more fully in Section 5.
4.5 Impact minimizing networks
In this section we explore calculating betweenness centrality for a limited set of locations, which might represent a base camp, rest sites, and experimental sites for a season’s expedition.
This is relatively straightforward and, like radius-limited betweenness, involves limiting the shortest paths considered to only those among a chosen subset of vertices in the graph. One slight simplification we make for performance reasons is to weight every appearance of a vertex on a geodesic equally, even if multiple equal-cost shortest paths exist between two vertices. In practice this is unlikely to much affect the resulting estimates of relative betweenness, since with real-valued edge costs two paths of exactly equal cost are unlikely. An example result is shown in Figure 10. One refinement shown in the figure is that sites have been weighted by relative importance, since a site such as an expedition base camp can be expected to originate more trips than other sites.
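The site-limited tally can be sketched as follows, with each site pair contributing a single shortest path (the single-geodesic simplification described above). Weighting each path by its origin site's importance is an assumed scheme for illustration, as are the graph and site names.

```python
import heapq

def dijkstra_path(adj, src, dst):
    """One shortest path src -> dst on a weighted digraph (adjacency dict)."""
    dist, prev, seen = {src: 0.0}, {}, set()
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, u = [dst], dst
    while u != src:
        u = prev[u]
        path.append(u)
    return path[::-1]

def site_betweenness(adj, site_weights):
    """Tally weighted appearances of vertices on shortest paths among sites."""
    tally = {}
    sites = list(site_weights)
    for s in sites:
        for t in sites:
            if s == t:
                continue
            for v in dijkstra_path(adj, s, t)[1:-1]:
                tally[v] = tally.get(v, 0.0) + site_weights[s]
    return tally

# Toy graph: a base camp and two sites all joined through a pass 'p';
# the base camp is weighted to originate three times as many trips
adj = {"base": {"p": 1.0}, "p": {"base": 1.0, "s1": 1.0, "s2": 1.0},
       "s1": {"p": 1.0}, "s2": {"p": 1.0}}
tally = site_betweenness(adj, {"base": 3.0, "s1": 1.0, "s2": 1.0})
print(tally)  # {'p': 10.0}: every inter-site trip crosses the pass
```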
A further analytical possibility is to use the hiking network to develop a set of routes among the sites that might minimise total impact, by confining movement to a limited set of paths. This approach is consistent with work in ecology and park and wildlife management which suggests that path networks can reduce the impact of human movement in many environments (Cole 1995a; Cole 1995b; Kidd et al. 2015; Marion et al. 2016; Piscová et al. 2023; Tomczyk and Ewertowski 2023).
The approach taken is first to determine the lengths of all the shortest paths among all the sites. Then a minimum spanning tree for the sites is calculated, which is a graph with minimum total edge length that connects all the sites (Prim 1957). One complication, which we do not account for, is that algorithms for finding the minimum spanning tree of a directed graph (referred to as a minimum-cost arborescence or optimal branching) are more complex (Korte and Vygen 2018), and not at present supported in igraph. While there is some asymmetry in paths between locations depending on the direction of travel, it is not extreme, and so we approximate a solution by finding the minimum spanning tree links (i.e. direct connections) among sites treating our network as undirected, and then determining paths over the ground to make the connections between sites based on the directed hiking network. The resulting path network for the same set of sites is shown in Figure 11.
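The minimum spanning tree step can be sketched with Prim's algorithm on a symmetric matrix of site-to-site shortest-path costs, mirroring the undirected approximation described above. The site names and costs below are hypothetical.

```python
def prim_mst(dist):
    """Prim's algorithm on a symmetric site-to-site distance matrix.

    dist[(a, b)] holds the shortest-path cost between sites a and b,
    treated as undirected (dist must contain both orderings of each pair).
    Returns the tree as a list of (from, to) links.
    """
    sites = sorted({a for a, _ in dist})
    in_tree, links = {sites[0]}, []
    while len(in_tree) < len(sites):
        # Add the cheapest link from the growing tree to any outside site
        a, b = min(((a, b) for a in in_tree for b in sites if b not in in_tree),
                   key=lambda e: dist[e])
        links.append((a, b))
        in_tree.add(b)
    return links

# Hypothetical shortest-path costs (hours) among a camp and three sites
pairs = {("camp", "s1"): 2.0, ("camp", "s2"): 5.0, ("camp", "s3"): 6.0,
         ("s1", "s2"): 2.5, ("s1", "s3"): 4.0, ("s2", "s3"): 1.5}
dist = {**pairs, **{(b, a): c for (a, b), c in pairs.items()}}
print(prim_mst(dist))  # [('camp', 's1'), ('s1', 's2'), ('s2', 's3')]
```

In the full analysis each link of this tree would then be traced over the ground as a path in the directed hiking network.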